A Critical Reexamination of Intra-List Distance and Dispersion
Diversification of recommendation results is a promising approach for coping
with the uncertainty associated with users' information needs. Of particular
importance in diversified recommendation is to define and optimize an
appropriate diversity objective. In this study, we revisit the most popular
diversity objective called intra-list distance (ILD), defined as the average
pairwise distance between selected items, and a similar but lesser known
objective called dispersion, which is the minimum pairwise distance. Owing to
their simplicity and flexibility, ILD and dispersion have been used in a
plethora of studies on diversified recommendation. Nevertheless, little is
actually known about what kinds of items these objectives prefer.
We present a critical reexamination of ILD and dispersion from theoretical
and experimental perspectives. Our theoretical results reveal that these
objectives have potential drawbacks: ILD may select duplicate items that are
very close to each other, whereas dispersion may overlook distant item pairs.
As a competitor to ILD and dispersion, we design a diversity objective called
Gaussian ILD, which can interpolate between ILD and dispersion by tuning the
bandwidth parameter. We verify our theoretical results by experimental results
using real-world data and confirm the extreme behavior of ILD and dispersion in
practice.
Comment: 10 pages, to appear in the 46th International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR 2023).
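The two objectives above have simple closed forms, and a small example makes the contrast concrete. The sketch below computes ILD (average pairwise distance) and dispersion (minimum pairwise distance) as defined in the abstract; the `gaussian_ild` function is one plausible kernel-based interpolation with a bandwidth `sigma`, not necessarily the paper's exact definition.

```python
# Sketch of the diversity objectives discussed above, over item embeddings.
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ild(items):
    """Intra-list distance: average pairwise distance."""
    pairs = list(combinations(items, 2))
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)

def dispersion(items):
    """Dispersion: minimum pairwise distance."""
    return min(euclidean(a, b) for a, b in combinations(items, 2))

def gaussian_ild(items, sigma):
    """Hypothetical Gaussian ILD: average pairwise kernel distance
    1 - exp(-d^2 / (2 sigma^2)); interpolates between ILD-like and
    dispersion-like behaviour as sigma varies."""
    pairs = list(combinations(items, 2))
    return sum(1.0 - math.exp(-euclidean(a, b) ** 2 / (2 * sigma ** 2))
               for a, b in pairs) / len(pairs)

# A near-duplicate pair plus one distant item: ILD stays high while
# dispersion collapses, matching the drawbacks described above.
items = [(0.0, 0.0), (0.01, 0.0), (10.0, 0.0)]
print(ild(items))         # high average despite the duplicate pair
print(dispersion(items))  # ~0.01: dominated by the duplicate pair
```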
Curse of "Low" Dimensionality in Recommender Systems
Beyond accuracy, there are a variety of aspects to the quality of recommender
systems, such as diversity, fairness, and robustness. We argue that many of the
prevalent problems in recommender systems are partly due to the low dimensionality
of user and item embeddings, particularly when dot-product models, such as
matrix factorization, are used.
In this study, we showcase empirical evidence suggesting the necessity of
sufficient dimensionality for user/item embeddings to achieve diverse, fair,
and robust recommendation. We then present theoretical analyses of the
expressive power of dot-product models. Our theoretical results demonstrate
that the number of possible rankings expressible under dot-product models is
exponentially bounded by the dimension of item factors. We empirically found
that low dimensionality contributes to popularity bias, widening the gap
between the rank positions of popular and long-tail items; we also give a
theoretical justification for this phenomenon.
Comment: Accepted by SIGIR'2
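The expressiveness claim above can be illustrated with a toy experiment: fix random item embeddings, sample many user vectors, and count how many distinct dot-product rankings actually occur. This is an illustrative sketch, not the paper's formal bound, and all names are ours; with one-dimensional embeddings only two orderings are reachable (one per sign of the user scalar), far fewer than the 5! = 120 possible rankings of five items.

```python
# Counting distinct rankings induced by dot-product scoring at low dimension.
import random

def induced_ranking(user, item_vecs):
    scores = [sum(u * v for u, v in zip(user, item)) for item in item_vecs]
    return tuple(sorted(range(len(item_vecs)), key=lambda i: -scores[i]))

def count_rankings(item_vecs, dim, trials=20000, seed=0):
    rng = random.Random(seed)
    seen = set()
    for _ in range(trials):
        user = [rng.gauss(0, 1) for _ in range(dim)]
        seen.add(induced_ranking(user, item_vecs))
    return len(seen)

rng = random.Random(1)
n_items = 5
items_1d = [[rng.gauss(0, 1)] for _ in range(n_items)]
items_2d = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_items)]
print("d=1:", count_rankings(items_1d, 1))  # only 2 reachable orderings
print("d=2:", count_rankings(items_2d, 2))  # more, but still far below 120
```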
Exploration of Unranked Items in Safe Online Learning to Re-Rank
Bandit algorithms for online learning to rank (OLTR) problems often aim to
maximize long-term revenue by utilizing user feedback. From a practical point
of view, however, such algorithms have a high risk of hurting user experience
due to their aggressive exploration. Thus, there has been a rising demand for
safe exploration in recent years. One approach to safe exploration is to
gradually enhance an original ranking whose quality is already guaranteed to be
acceptable. In this paper, we propose a safe OLTR algorithm that
efficiently exchanges one of the items in the current ranking with an item
outside the ranking (i.e., an unranked item) to perform exploration. We
optimistically select an unranked item to explore based on Kullback-Leibler upper
confidence bounds (KL-UCB) and safely re-rank the items including the selected
one. Through experiments, we demonstrate that the proposed algorithm achieves
lower long-term regret than baseline methods without any safety violation.
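The KL-UCB index mentioned above is standard for Bernoulli rewards: the upper confidence bound is the largest mean q whose KL divergence from the empirical mean stays within a log(t)/n budget, found here by bisection. This is a generic textbook sketch of the index, not the paper's full re-ranking algorithm.

```python
# Bernoulli KL-UCB index via bisection.
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for safety."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb(p_hat, n, t, iters=50):
    """Largest q with KL(p_hat, q) <= log(t)/n, after n pulls at round t."""
    budget = math.log(max(t, 1)) / max(n, 1)
    lo, hi = p_hat, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# The bound tightens toward the empirical mean as the item is pulled more.
print(kl_ucb(0.3, 10, 1000), kl_ucb(0.3, 1000, 1000))
```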
Safe Collaborative Filtering
Excellent tail performance is crucial for modern machine learning tasks, such
as algorithmic fairness, class imbalance, and risk-sensitive decision making,
as it ensures the effective handling of challenging samples within a dataset.
Tail performance is also a vital determinant of success for personalised
recommender systems to reduce the risk of losing users with low satisfaction.
This study introduces a "safe" collaborative filtering method that prioritises
recommendation quality for less-satisfied users rather than focusing on the
average performance. Our approach minimises the conditional value at risk
(CVaR), which represents the average risk over the tails of users' loss. To
overcome computational challenges for web-scale recommender systems, we develop
a robust yet practical algorithm that extends the most scalable method,
implicit alternating least squares (iALS). Empirical evaluation on real-world
datasets demonstrates the excellent tail performance of our approach while
maintaining competitive computational efficiency.
DMS: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention
There is increasing interest in the use of multimodal data in various web
applications, such as digital advertising and e-commerce. Typical methods for
extracting important information from multimodal data rely on a mid-fusion
architecture that combines the feature representations from multiple encoders.
However, as the number of modalities increases, several potential problems with
the mid-fusion model structure arise, such as an increase in the dimensionality
of the concatenated multimodal features and missing modalities. To address
these problems, we propose a new concept that considers multimodal inputs as a
set of sequences, namely, deep multimodal sequence sets (DMS). Our
set-aware concept consists of three components that capture the relationships
among multiple modalities: (a) a BERT-based encoder to handle the inter- and
intra-order of elements in the sequences, (b) intra-modality residual attention
(IntraMRA) to capture the importance of the elements in a modality, and (c)
inter-modality residual attention (InterMRA) to further enhance the importance
of elements with modality-level granularity. Our concept exhibits performance
that is comparable to or better than previous set-aware models.
Furthermore, we demonstrate that the visualization of the learned InterMRA and
IntraMRA weights can provide an interpretation of the prediction results.
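The hierarchical attention idea above can be caricatured in a few lines: attention first pools the elements within each modality, then pools the modality-level summaries. This toy sketch uses plain dot-product attention and omits the BERT encoders and residual connections of the actual model; all names and vectors are illustrative.

```python
# Two-level (intra- then inter-modality) attention pooling, toy version.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(vectors, query):
    """Dot-product attention: softmax-weighted sum of vectors."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    weights = softmax(scores)
    pooled = [sum(w * vec[i] for w, vec in zip(weights, vectors))
              for i in range(len(vectors[0]))]
    return pooled, weights

query = [1.0, 0.0]
modalities = {
    "text":  [[0.9, 0.1], [0.2, 0.8]],
    "image": [[0.5, 0.5], [0.1, 0.9]],
}
# Intra-modality step: pool the elements within each modality.
summaries = {m: attend(vecs, query)[0] for m, vecs in modalities.items()}
# Inter-modality step: pool the modality summaries into one representation.
fused, weights = attend(list(summaries.values()), query)
print(fused, weights)
```

Inspecting `weights` at each level is the kind of interpretation the abstract refers to: the inter-modality weights show which modality drove the prediction.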
Optimal correction cost for object detection evaluation
Mean Average Precision (mAP) is the primary evaluation measure for object detection. Although object detection has a broad range of applications, mAP evaluates detectors in terms of the performance of ranked instance retrieval. This assumption does not suit some downstream tasks. To alleviate the gap between downstream tasks and the evaluation scenario, we propose Optimal Correction Cost (OC-cost), which assesses detection accuracy at the image level. OC-cost computes the cost of correcting detections to ground truths as a measure of accuracy. The cost is obtained by solving an optimal transportation problem between the detections and the ground truths. Unlike mAP, OC-cost is designed to penalize false positive and false negative detections properly, and every image in a dataset is treated equally. Our experimental results validate that OC-cost has better agreement with human preference than a ranking-based measure, i.e., mAP for a single image. We also show that detectors' rankings by OC-cost are more consistent on different data splits than those by mAP. Our goal is not to replace mAP with OC-cost but to provide an additional tool to evaluate detectors from another aspect. To help future researchers and developers choose a target measure, we provide a series of experiments to clarify how mAP and OC-cost differ.
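The core of OC-cost as described above is a minimum-cost pairing between detections and ground truths, with fixed penalties for unmatched boxes (false positives and false negatives). The sketch below replaces the optimal transport formulation with a brute-force one-to-one matching over toy 1-D "boxes"; the pair cost and penalty value are illustrative, not the paper's.

```python
# Toy image-level correction cost: optimal one-to-one matching plus
# fixed penalties for unmatched detections / ground truths.
from itertools import combinations, permutations

def pair_cost(det, gt):
    """Toy correction cost: absolute difference of 1-D box centres."""
    return abs(det - gt)

def oc_cost(dets, gts, unmatched_penalty=1.0):
    n, m = len(dets), len(gts)
    k = min(n, m)
    best = float("inf")
    for det_subset in combinations(range(n), k):
        for perm in permutations(range(m), k):
            matched = sum(pair_cost(dets[i], gts[j])
                          for i, j in zip(det_subset, perm))
            # Unmatched boxes (false positives / negatives) pay the penalty.
            total = matched + unmatched_penalty * ((n - k) + (m - k))
            best = min(best, total)
    return best

print(oc_cost([0.1, 0.5], [0.1, 0.9]))  # both matched: 0.0 + 0.4
print(oc_cost([0.1], [0.1, 0.9]))       # one missed ground truth: 0.0 + 1.0
```

In practice the brute-force loops would be replaced by a solver (e.g. a Hungarian-algorithm or optimal-transport routine); the matching objective is what matters here.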
AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval
Evaluation measures have a crucial impact on the direction of research. Therefore, it is of utmost importance to develop appropriate and reliable evaluation measures for new applications where conventional measures are not well suited. Video Moment Retrieval (VMR) is one such application, and the current practice is to use R@K, θ for evaluating VMR systems. However, this measure has two disadvantages. First, it is rank-insensitive: it ignores the rank positions of successfully localised moments in the top-K ranked list by treating the list as a set. Second, it binarizes the Intersection over Union (IoU) of each retrieved video moment using the threshold θ and thereby ignores the fine-grained localisation quality of ranked moments. We propose an alternative measure for evaluating VMR, called Average Max IoU (AxIoU), which is free from the above two problems. We show that AxIoU satisfies two important axioms for VMR evaluation, namely, Invariance against Redundant Moments and Monotonicity with respect to the Best Moment, and also that R@K, θ satisfies only the first axiom. We also empirically examine how AxIoU agrees with R@K, θ, as well as its stability with respect to changes in the test data and human-annotated temporal boundaries.
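A sketch of how such a measure might be computed. The `axiou_at_k` formulation below (averaging, over cutoffs k = 1..K, the best IoU within the top-k) is our reading of "Average Max IoU" that satisfies the rank-sensitivity requirement stated above; the paper's exact definition may differ.

```python
# Temporal IoU of video moments, and a rank-sensitive AxIoU-style score.
def temporal_iou(a, b):
    """IoU of two moments given as (start, end) intervals in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def axiou_at_k(ranked_moments, gt, k):
    """Average, over cutoffs 1..k, of the best IoU within the top-k list."""
    ious = [temporal_iou(m, gt) for m in ranked_moments[:k]]
    best_so_far, total = 0.0, 0.0
    for iou in ious:
        best_so_far = max(best_so_far, iou)
        total += best_so_far
    return total / k

gt = (10.0, 20.0)
ranked = [(12.0, 22.0), (10.0, 20.0), (50.0, 60.0)]
print(axiou_at_k(ranked, gt, 3))  # graded, no theta threshold needed
```

Note how swapping the first two moments changes the score: unlike R@K, θ, the measure rewards placing the best localisation higher in the list.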